Baudry B. (Service des Entérobactéries, Unité INSERM 199, Institut Pasteur, 75724 Paris
Cedex 15, France), M. Kaczorek and P. J. Sansonetti. Nucleotide sequence of the invasion
plasmid antigen B and C genes (ipaB and ipaC) of Shigella flexneri. Microbial Pathogenesis
1988; 4: 345-357. The nucleotide sequence of a 4.8 kilobase (kb) HindIII fragment from
pWR100, the virulence plasmid of Shigella flexneri 5, was determined and analysed. This
fragment encodes polypeptides b (62 kilodalton, kD) and c (43 kD), which have already been
described as two of the four immunogenic polypeptides of Shigellae. The nucleotide sequence
revealed that, in addition to the ipaB and ipaC genes encoding polypeptides b and c, the
fragment contains a third complete open reading frame. This gene, named ippI, encodes a
17 kD polypeptide. The deduced amino acid sequences of polypeptides b and c showed no signal
peptide but did show highly hydrophobic domains compatible with a transmembrane
location. The surprising A and T richness of the three genes, as compared with the Escherichia
coli and Shigella genomes, results in a biased codon usage and raises the question of the
origin of the sequences.
Introduction

The pathogenic potential of Shigellae, the etiologic agents of bacillary dysentery, is
correlated with the ability of these bacteria to enter and multiply within colonic
epithelial cells. 1 It has now been well established that expression of genes located
both on the virulence plasmid and on the chromosome is required for full virulence. 2-6
However, the plasmid by itself is sufficient to promote the entry of the bacteria into
cells. 6

Up to now, very few data have been published on proteins which could be involved
in the invasive process. Nonetheless, seven plasmid-encoded polypeptides have been
found to be specifically associated with invasive strains of Shigella flexneri. 7 Among
these, four polypeptides, named a, b, c and d, were consistently recognized by sera
from monkeys which had been infected with S. flexneri and represent the major
proteinaceous antigens of Shigellae. 1 Another study has shown that sera from
children recovering from shigellosis contained antibodies directed against the same
polypeptides. 8 Polypeptides b and c, with estimated molecular weights of 62 and 43
kilodaltons (kD) respectively, appear to be the dominant antigens.

Fig. 1. Immunoblot of whole cell extracts reacted with serum from a monkey immunized against Shigella
flexneri 2a (diluted 1/200). Lane 1, molecular weights (BRL); 2, M90T; 3, BS169/pHS5753. Exposure 6
days.

We have recently reported the cloning 9 and the characterization by mutagenesis and
subcloning 10 of a 45 kilobase (kb) fragment from pWR100, the virulence plasmid of
S. flexneri serotype 5 strain M90T. This recombinant plasmid confers both the ability
to invade HeLa cells and the capacity to express the four immunogenic polypeptides. 9
Study of insertion mutants revealed that low expression of the immunogenic
polypeptides b, c and d resulted in a dramatic decrease in the invasive ability of the mutant
strain, whereas a mutant which did not express polypeptide a was still invasive in the
HeLa cell assay. 10 We have decided to focus on the study of polypeptides b and c
because of their high immunogenicity, which may have important implications for
vaccine design.

Fig. 2. Open reading frames (ORF) and restriction map of the 4.8 kb HindIII fragment. Vertical lines
represent the nonsense codons on the total length of the 4.8 kb segment. ORFs corresponding to genes ippI,
ipaB and ipaC are indicated. Scale is in base pairs.

This manuscript reports the complete amino acid sequence of polypeptides b and c 
deduced from the nucleotide sequence of genes ipaB and ipaC. We also report the 
presence of an open reading frame (ORF) located upstream of ipaB, which encodes a 
17 kD polypeptide. The genetic organization of the genes is discussed.



Results 

Expression of immunogenic polypeptides 

A 4.8 kb HindIII fragment spanning the region encoding polypeptides b and c, as
predicted by insertion mutagenesis, 10 was subcloned into the HindIII site of plasmid
pACYC184. 11 The resulting plasmid, pHS5753, was subsequently introduced into
Shigella strain BS169.

To analyse the ability of the HindIII fragment to direct synthesis of polypeptides b
and c, whole cell extracts from BS169 carrying plasmid pHS5753 were analysed by
immunoblotting using antiserum from a monkey orally immunized against S. flexneri
2a. Figure 1 shows that both polypeptides were detected, migrating with a relative
molecular weight (Mr) of 62 and 43 kD respectively. Therefore, it is likely that the 4.8
kb HindIII fragment from the virulence plasmid pWR100 of S. flexneri contains the
ipaB and ipaC genes.



Nucleotide sequence 

The nucleotide sequence of the 4.8 kb HindIII fragment was completely determined
on both strands. Surprisingly, in addition to the two expected open reading frames
(ORF) for polypeptides b and c, a third ORF was found (Fig. 2). All three ORFs were
on the same DNA strand, and therefore in the same orientation. The ORFs were 485
base pairs (bp) ("i"), 1802 bp ("b") and 1108 bp ("c") long. The complete nucleotide
sequence of these three ORFs is represented in Fig. 3.
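
The kind of ORF map drawn in Fig. 2 (nonsense codons marked along each reading frame) can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name find_orfs, the min_len threshold and the toy sequence are hypothetical, only the three forward frames are scanned, and this is not the Staden software used in this study.

```python
# Illustrative ORF scan; names and the toy sequence are hypothetical.
STOPS = {"TAA", "TAG", "TGA"}  # nonsense codons of the standard genetic code


def find_orfs(seq, min_len=30):
    """Return (frame, start, end) for stop-free stretches in the 3 forward frames.

    start and end are 0-based, end-exclusive, and delimit the codons lying
    between two nonsense codons (or between a sequence end and a nonsense codon).
    The result is sorted with the longest stretch first.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_start = frame
        last = frame
        for pos in range(frame, len(seq) - 2, 3):
            last = pos + 3
            if seq[pos:pos + 3] in STOPS:
                if pos - open_start >= min_len:
                    orfs.append((frame, open_start, pos))
                open_start = pos + 3
        # A stretch still open at the end of the fragment is a truncated ORF.
        if last - open_start >= min_len:
            orfs.append((frame, open_start, last))
    return sorted(orfs, key=lambda o: o[2] - o[1], reverse=True)


toy = "ATG" + "GCA" * 12 + "TAA" + "CC"
for frame, start, end in find_orfs(toy):
    print(frame, start, end, end - start)
```

Stretches that run into either end of the input are reported as well, which mirrors the two truncated ORFs seen at the ends of the cloned fragment. Scanning the complementary strand would require running the same function on the reverse complement.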

Furthermore, the nucleotide sequence showed the presence, on the same strand, of
two other truncated ORFs located at each end of the cloned fragment (Fig. 2). At the
5' end is a 547 bp ORF which probably starts upstream of the HindIII
cloning site. At the 3' end of the 4.8 kb fragment, an 842 bp ORF is interrupted
by the other HindIII cloning site. The nucleotide sequences of these two truncated ORFs
are not shown.

Fig. 3. Nucleotide sequence of the ippI, ipaB and ipaC genes and deduced amino acid sequences of the
corresponding polypeptides i, b, and c. Arrows correspond to inverted repeats found in the nucleotide
sequence; [...] arrows represent direct repeats. Possible RBS are underlined, and putative "-10"
and "-35" sequences are signaled with dots.

In addition to the SalI site previously mapped, 10 a search for restriction endonuclease
sites within the sequence revealed the presence of a second SalI site which had not
been detected by agarose gel electrophoresis when the physical map of pHS4108 was
determined. The SalI-SalI fragment generated was 383 bp long and was easily
detected by electrophoresis on polyacrylamide gel (data not shown).

Features of the nucleotide sequence 

The G+C percentage, calculated either on the total DNA sequence, or on each 
complete ORF, was 37%. 
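
For readers who wish to repeat this kind of calculation on any sequence, a G+C percentage can be computed as sketched here. The function name gc_percent and the toy input are hypothetical, and ambiguous bases are simply ignored, which is an assumption rather than a description of the software used in this study.

```python
def gc_percent(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("no unambiguous bases in sequence")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


print(gc_percent("ATGCATAT"))  # 2 of 8 bases are G or C
```

The same function can be applied to the whole fragment or to each ORF separately, which is the comparison reported above.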



Fig. 3 (continued).
Fig. 4. Detail of the sequence showing possible loop structures at the 3' end of the ipaB gene. Direct
repeats are underlined with arrows. The TGA termination codon of the ipaB gene is underlined with dots. The
RBS sequence (box) and the initiation codon (underlined) of the ipaC gene are represented. Thick lines show
the position of the putative "-10" and "-35" sequences upstream of the ipaC gene. The ΔG° at 25°C of the loop
structures is indicated and was calculated according to Cantor and Schimmel. 24

ORF "c". The ORF "i" was associated with the newly identified polypeptide i (see
below: "Identification of the ippI gene product") and has been named ippI (for invasion
plasmid polypeptide).

According to taxonomic studies, Shigellae are very closely related to E. coli. 13
Moreover, cloned Shigella genes are usually easily expressed in E. coli. 14 Therefore,
similarities between transcriptional and translational signals from Shigella and E. coli
were expected. Upstream of the ipaC gene, possible "-10" and "-35" regions were
found. At position -96 from the ATG, a TATTAG sequence which is related to the
"-10" E. coli consensus sequence TATAAT, 15 and at position -114 from the ATG, a
TTGCAG sequence similar to the "-35" consensus sequence TTGACA of E. coli, 15
were found (Fig. 3). The distance of 16 bp between these putative "-10" and "-35"
sequences is comparable to the 17 bp found in the E. coli promoter regions. Upstream
of the ippI gene, a putative "-10" sequence, TATATT, was found at position -45
from the predicted initiation codon ATG, but no sequence related to the "-35"
consensus sequence could be found. No sequences related either to the "-10" or
"-35" promoter consensus sequences could be found within the 5' non-coding region
of ipaB.
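
The degree of relatedness between the hexamers quoted above and the E. coli consensus sequences can be made explicit by counting identical positions. The sketch below uses only the sequences given in the text; the function name matches and the dictionary layout are hypothetical, and a simple identity count is an assumption, not necessarily the criterion used in the original analysis.

```python
# Consensus hexamers and candidate sequences are those quoted in the text.
CONSENSUS = {"-10": "TATAAT", "-35": "TTGACA"}


def matches(hexamer, consensus):
    """Number of positions at which hexamer is identical to the consensus."""
    if len(hexamer) != len(consensus):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


candidates = [
    ("ipaC", "-10", "TATTAG"),
    ("ipaC", "-35", "TTGCAG"),
    ("ippI", "-10", "TATATT"),
]
for gene, box, hexamer in candidates:
    score = matches(hexamer, CONSENSUS[box])
    print(f"{gene} {box}: {hexamer} matches {score}/6 consensus positions")
```

With this count, the ipaC "-10" and "-35" candidates share 4 and 3 of 6 positions with their consensus, and the ippI "-10" candidate shares 5 of 6.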

Direct and inverted repeats, some of which are represented in Fig. 3, were found
in the sequence. Figure 4 shows part of the sequence corresponding to the 3' end of
ipaB, which also contains the putative promoter sequence for the ipaC gene. Within
this 100 bp segment a direct repeat, which comprises the possible "-35" of ipaC,
and two inverted repeats, which could form stable loops, were found (Fig. 4). The
putative "-10" sequence of the ipaC gene was located within one of these inverted
repeats. The presence of such potential secondary structure within a promoter region has
been previously reported for some genes regulated at the transcriptional level. 16

On the complementary strand, another ORF was detected, which was 591 bp long. 
However, the first ATG was located in the middle of the ORF, 253 bp downstream. 
This ATG was also preceded by a putative RBS which was similar to the sequence 
previously seen in front of the other genes. 
Identification of the ippI gene product

Polyacrylamide gel electrophoresis of [35S]-methionine labeled proteins from minicells
containing pHS5753 was performed to detect the product of the ippI gene
Table 1 Comparison between the codon usage of E. coli genes and the ORFs found
within the HindIII fragment. Codon usage of E. coli is from reference 17. Bold and underlined
numbers respectively emphasize particularly high or low frequencies of utilization by
the ippI, ipaB and ipaC Shigella genes as compared with E. coli genes.
the high A+T content of the genes, codon usage is quite different from that of E. coli.
The frequencies show that codons ending in A or U are largely preferred (Table 1). A
striking example is the AGA codon for arginine, which is used at a frequency of 52.2%
in these Shigella genes and only 1.2% in E. coli.
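
A codon usage comparison of this kind can be reproduced with a short script. The sketch below assumes that the percentages are expressed per amino acid, that is, as the share of each codon among the synonymous codons for the same residue; this reading of Table 1 is an assumption, and the function name synonymous_usage and the toy ORF are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_usage(orf):
    """Percentage use of each codon among the codons for the same amino acid."""
    orf = orf.upper().replace("U", "T")
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {c: 100.0 * n / totals[CODON_TABLE[c]] for c, n in counts.items()}


toy_orf = "AGAAGACGTAGG"  # four arginine codons: AGA, AGA, CGT, AGG
usage = synonymous_usage(toy_orf)
print(usage["AGA"])  # AGA is 2 of the 4 arginine codons in the toy ORF
```

Summing the counts over several ORFs before taking percentages would give pooled figures for a group of genes.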



Features of the polypeptides 

The amino acid composition deduced from the sequence was rather similar for the
three polypeptides. Briefly, polypeptide i contained 1.94% cysteine and all amino acids
except tryptophan, polypeptide b contained all residues but only one cysteine (0.11%),
and polypeptide c contained neither tryptophan nor cysteine.
polypeptides b and c are indicated by an arrow.



The hydropathy profile of each polypeptide was calculated by using the HYDROPLOT
program of Kyte and Doolittle 32 and is shown in Fig. 6. The hydropathy profile of
polypeptide i showed no particular features. On the other hand, polypeptides b and c
appeared to contain a hydrophobic domain within the central part of the molecule.
In polypeptide b, the hydrophobic domain, approximately 120 amino acids long, was
closely preceded by a very hydrophilic region of 180 amino acids. In the case of
polypeptide c, the ca. 60 amino acid long hydrophobic domain was immediately
followed by an approximately 110 residue long region which was hydrophilic.
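
A hydropathy profile of the Kyte and Doolittle type is a sliding-window average of per-residue hydropathy values. The sketch below is a minimal illustration and not the HYDROPLOT program itself: the function name hydropathy_profile, the toy peptide and the default window of 19 residues are assumptions, since the window used for Fig. 6 is not stated here.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every window of consecutive residues in protein."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KD[residue] for residue in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptide: a hydrophilic stretch followed by a hydrophobic one.
toy = "KDEKDEKDEK" + "LIVLIVLIVL"
profile = hydropathy_profile(toy, window=5)
print(min(profile), max(profile))
```

Runs of consecutive windows with strongly positive averages correspond to hydrophobic domains such as those described for polypeptides b and c.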
Discussion

In this study, we have determined the nucleotide sequence of a 4.8 kb HindIII fragment
from the recombinant plasmid pHS4108. 9,10 An immunoblot confirmed that this
fragment contained the ipaB and ipaC genes, encoding immunogenic polypeptides b
and c respectively, as predicted by the analysis of insertion mutants.

The 4.8 kb sequence consisted of five ORFs, with very short spaces in between.
Three of these reading frames were entirely contained within the fragment, and the
two others were truncated by the ends of the cloned segment. A 6 or 7 bp sequence
complementary to the 3' end of E. coli 16S rRNA was found close to the 5' end of each
ORF. After this, the first following ATG codons were designated as the initiation codons.
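
The rule described above (locate a stretch complementary to the 3' end of 16S rRNA, then take the first ATG that follows) can be sketched as below. The 3' terminal sequence of E. coli 16S rRNA (ACCUCCUUA, written here as DNA) is supplied from general knowledge and is not quoted in this paper; the function names, the toy sequence and the exact-match criterion are likewise assumptions for illustration.

```python
# 3' end of E. coli 16S rRNA, written 5'->3' as DNA (assumption, see lead-in).
ANTI_SD = "ACCTCCTTA"


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Sequence on the coding strand that would pair with the rRNA 3' end.
SD_FULL = revcomp(ANTI_SD)


def find_rbs_and_start(seq, min_len=6):
    """Return (rbs_pos, rbs, atg_pos) for the longest exact match, or None.

    Substrings of SD_FULL are tried from longest to shortest (down to min_len);
    the first ATG downstream of the match is reported as the initiation codon.
    """
    seq = seq.upper()
    for length in range(len(SD_FULL), min_len - 1, -1):
        motifs = {SD_FULL[i:i + length] for i in range(len(SD_FULL) - length + 1)}
        for pos in range(len(seq) - length + 1):
            if seq[pos:pos + length] in motifs:
                atg = seq.find("ATG", pos + length)
                if atg != -1:
                    return pos, seq[pos:pos + length], atg
    return None


toy = "CCAGGAGGTTTACATATGAAA"
print(find_rbs_and_start(toy))
```

Real ribosome binding sites often pair imperfectly with the rRNA, so a search allowing mismatches or G-U pairs would be closer to practice; exact matching is used here only to keep the sketch short.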

Based upon ORF size criteria and previously obtained data on the position of the
genes encoding the immunogenic polypeptides, the 1802 bp long ORF "b" and the
1178 bp long ORF "c" were attributed to genes ipaB and ipaC respectively. ORF "i"
could encode a 17 kD polypeptide which had never been identified. Study of the
products of pHS5753 indeed confirmed that a ca. 17 kD polypeptide was expressed.
This polypeptide has been named i, and the corresponding ORF ippI. However, no
insertion mutants have ever been found within this gene; therefore, we do not know if
it is involved in the invasion process.
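
The ORF size criterion can be illustrated with a rough estimate of polypeptide mass from ORF length. The sketch assumes an average residue mass of 110 Da (a common rule of thumb, not a value taken from this paper) and assumes that the quoted ORF lengths include the termination codon; the function name predicted_kd is hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def predicted_kd(orf_bp, includes_stop=True):
    """Approximate polypeptide mass in kD for an ORF of orf_bp base pairs."""
    codons = orf_bp // 3
    residues = codons - 1 if includes_stop else codons
    return residues * AVG_RESIDUE_DA / 1000.0


# ORF lengths as given in the Results; apparent sizes are 17, 62 and 43 kD.
for name, length in [("i", 485), ("b", 1802), ("c", 1108)]:
    print(name, length, round(predicted_kd(length), 1))
```

Such estimates only indicate the order of magnitude; apparent sizes on SDS-polyacrylamide gels can deviate from the mass calculated from the sequence.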

Though promoter-related sequences could not be predicted very easily, possible
"-10" and "-35" sequences were found upstream of ORF "c" and interpreted as a
putative promoter for the ipaC gene. No promoter-related sequences could be found
in front of the ipaB gene. Though this does not necessarily mean that no promoter
exists in front of ipaB, the possibility of a co-transcription of ippI and ipaB could be
considered and would raise a singular point. Indeed, the sequence shows
that the TAA codon marking the end of translation of polypeptide i is immediately preceded
by the probable RBS for the translation of polypeptide b. The very short space between
the two coding sequences (22 bp) and the absence of potential promoter-related
sequences suggest that ippI and ipaB may be transcribed as part of the same operon. This
overlapping of translational (termination and initiation) signals has already been
observed in the sequences of the operons encoding Vibrio cholerae enterotoxin 18,19
and E. coli Shiga-like toxin. 20,21 In both cases, the overlapping features are thought to
act as some sort of translation regulatory structure, responsible for the 1 to 5 ratio
observed between the products encoded by the two genes.

Until now, polypeptides b and c have been considered to be present in the membrane, 
or in the periplasmic space, because enhanced signals could be seen on immunoblots 
of membrane preparations and because antiserum directed against b and c could be 
obtained by injecting rabbits with water extracts of Shigella. However, the hydropathy 
profiles of these polypeptides revealed no signal peptide structures which could be 
expected for secreted proteins. 22 On the other hand, both polypeptides contained a 
large internal hydrophobic segment that could be an intra-membraneous region. 

Interesting results were obtained by comparing the codon usage of the three genes
with that of E. coli genes. The A+T richness of the sequenced fragment resulted in a
bias in codon usage: codons ending in A or U were largely favoured. Up to now,
only one gene of Shigella has been sequenced. 23 This gene, virF, was cloned from the
220 kb virulence plasmid of S. flexneri 2a. The nucleotide sequence of virF has revealed
the same A+T richness and a preponderant use of codons ending in A or U as in the
genes of the HindIII fragment. Although up to now the total length of sequenced
segments represents only 7 kb out of the 220 kb virulence plasmid of Shigellae, it is
very likely that these differences are conserved among the other virulence genes of
the plasmids. These results raise interesting questions about the origin of the virulence
genes carried by the plasmid of Shigellae. Furthermore, the difference in composition
of the DNA may possibly explain the general instability of the virulence plasmid, and
the high instability of the sequences responsible for invasion. The biased codon usage,
resulting in utilization of rare tRNAs, might also explain why the polypeptides involved
in invasion are so weakly produced.

Determination of the nucleotide sequence of the ipaB and ipaC genes opens new
fields of research on the molecular mechanism of the virulence of Shigellae. Moreover, 
based on the amino acid sequence, studies of the immunogenic domains and antigenic 
epitopes of polypeptides b and c will be undertaken, and will be helpful in the search 
for a vaccine against shigellosis. 
Table 2 Bacterial strains and plasmids

Strain or plasmid   Characteristics                                         Plasmid content                     Reference

S. flexneri 5
  M90T              wild type                                               pWR100, 2 small cryptic plasmids    5
S. flexneri 2a
  BS169             Mal+ λ* galU::Tn10                                      2 small cryptic plasmids            9
  BS213             λ papa lysogen of BS169                                 2 small cryptic plasmids            this study
E. coli
  JM101             Δ(lac pro), supE, thi, F' traD36, proAB, lacIq ZΔM15    none                                26
pWR100              virulence plasmid from strain M90T                                                          5
pHS4108             recombinant plasmid containing a 45 kb insert                                               9
                    from pWR100
pACYC184            cloning vector Tcr Cmr                                                                      11



Materials and methods 

Bacterial strains, plasmids and media. Bacterial strains and plasmids are listed in Table 2.
Bacteria were routinely grown in L broth. For transfection and production of single stranded
templates, strain JM101 was grown in SOB (2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5
mM KCl, 20 mM MgSO4, pH 6.8-7) and 2YT (1.6% tryptone, 1.6% yeast extract, 0.5% NaCl),
respectively.

M13 cloning. Isolation of plasmid, purification and modification of DNA fragments, DNA
ligation and transformation were carried out as described in Maniatis et al. Plasmid pHS5753
was sonicated, treated with the Klenow fragment of DNA polymerase I (Genofit) in the presence
of deoxyribonucleotides (Boehringer) for 16 h at 16°C, then fractionated by agarose gel
electrophoresis. Fragments of 200-500 bp were electro-eluted and purified by chromatography
on DEAE Sephacel (Pharmacia). DNA was ethanol-precipitated, retreated with the Klenow
fragment of DNA polymerase I (Genofit) and T4 DNA polymerase (BRL), ligated to
dephosphorylated SmaI-cleaved M13 mp8 RF DNA (Amersham) using T4 DNA ligase (Biolabs) for
16 h at 16°C and transfected into Escherichia coli strain JM101. Transfected bacteria were then
spread on an agar medium in the presence of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside
(X-gal).

Screening of M13 recombinants. White M13 clones containing a DNA insert were replicated
onto filters and screened by colony hybridization using plasmid vector pACYC184 labelled with
32P as a probe. Phages that did not hybridize with this probe were selected for sequencing.

Sequencing technique and computer analysis. Preparation of single stranded DNA from
individual plaques was performed as described by Messing. 26 The DNA sequence was determined
by the dideoxy chain termination procedure 25 using 2'-deoxyadenosine 5'-[α-[35S]thio]triphosphate
(Amersham, 400 Ci/mmol) and buffer gradient gels. 28 Sequences were compiled and
analysed using the programs of Staden 29-31 adapted by B. Caudron for the MV8000 computer
of the Institut Pasteur Computer Center. Hydrophobicity profiles were calculated by the method
of Kyte and Doolittle. 32

Immunoblots. Whole bacterial extracts from BS169/pHS5753 were run on 0.1% sodium
dodecyl sulfate (SDS)-12.5% polyacrylamide gels and blotted onto nitrocellulose filters as
described by Burnette. 33 The protein-loaded filters were treated as described by Fisher et al.
Diluted (1/250) convalescent serum from a monkey which had been orally infected with S.
flexneri 2a was used to detect expression of immunogenic polypeptides.

Analysis of proteins expressed in minicells. Purification of minicells from 14 h L broth cultures
was accomplished by differential centrifugation and three sucrose density gradient separations.
Purified minicells were labelled for 1 h with [35S]methionine (50 µCi/ml, 800 Ci/mmol,
Amersham). After washing, minicells were solubilized and extracts were run on 0.1% SDS-12.5%
acrylamide PAGE. 36 Fixation and fluorography of dried gels were performed as previously
described. 37

We wish to thank Thierry Garnier for his assistance in the utilization of the computer programs,
and Armelle Phalipon and Catherine Gelin for their help. B.B. was supported by a fellowship
from the "Ministère de la Recherche et de l'Enseignement Supérieur".